Incremental OPTICS: Efficient Computation of Updates in a Hierarchical Cluster Ordering
Authors
Abstract
Data warehouses are a challenging field of application for data mining tasks such as clustering. Updates are usually collected and applied to the data warehouse periodically in batch mode. As a consequence, all patterns mined from the data warehouse (e.g., clustering structures) have to be updated as well. In this paper, we present a method for incrementally updating the clustering structure computed by the hierarchical clustering algorithm OPTICS. We determine the parts of the cluster ordering that are affected by update operations and develop efficient algorithms that incrementally update an existing cluster ordering. A performance evaluation on synthetic datasets as well as a real-world dataset demonstrates that incremental OPTICS achieves significant speed-up factors over OPTICS for update operations.
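To make the idea of "affected parts" concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the standard OPTICS parameters eps and MinPts, counts a point as its own neighbor, and uses hypothetical function names (`core_distance`, `affected_by_insert`). It shows only one ingredient: an inserted point can change the core distance of existing points only within its eps-neighborhood, so all other points can be skipped. Updating reachability distances and the ordering itself is not covered here.

```python
import math

def core_distance(points, i, eps, min_pts):
    """Distance from points[i] to its min_pts-th nearest neighbor.

    Assumption: the point counts as its own neighbor. Returns None
    when fewer than min_pts points lie within eps (not a core point).
    """
    dists = sorted(math.dist(points[i], p) for p in points)
    d = dists[min_pts - 1] if len(dists) >= min_pts else math.inf
    return d if d <= eps else None

def affected_by_insert(points, new_point, eps, min_pts):
    """Indices of existing points whose core distance changes on insertion."""
    updated = points + [new_point]
    affected = []
    for i, p in enumerate(points):
        if math.dist(p, new_point) > eps:
            continue  # outside the eps-neighborhood: core distance cannot change
        before = core_distance(points, i, eps, min_pts)
        after = core_distance(updated, i, eps, min_pts)
        if before != after:
            affected.append(i)
    return affected

points = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(affected_by_insert(points, (0.5, 0), eps=2.5, min_pts=3))  # [0, 2]
```

In this toy dataset the far-away point at (10, 0) is never examined, and the point at (1, 0) is examined but keeps its core distance, so only two of the four points would need further processing.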
Similar resources
Dynamic Local Feature Selection in Incremental Clustering
In this paper we describe a preliminary study into the use of feature selection in incremental hierarchical clustering. Our aim is to add this capability to the clustering system while still maintaining the incremental nature of the learning process. This constraint led us to consider a dynamic feature selection mechanism that is performed in parallel with the clustering process. In addition, feature ...
An Incremental Approach to Building a Cluster Hierarchy
In this paper we present a novel Incremental Hierarchical Clustering (IHC) algorithm. Our approach aims to construct a hierarchy that satisfies the homogeneity and monotonicity properties. Working in a bottom-up fashion, a new instance is placed in the hierarchy, and a sequence of hierarchy restructuring processes is performed only in regions that have been affected by the presence of the new ...
Incremental Shared Nearest Neighbor Density-Based Clustering Algorithms for Dynamic Datasets
Dynamic datasets undergo frequent changes in which a small number of data points are added and deleted. Such dynamic datasets are frequently encountered in many real-world applications such as search engines and recommender systems. Incremental data mining algorithms process these updates to datasets efficiently to avoid redundant computation. Shared nearest neighbor density based clustering (SNN-DB...
Batch Incremental Shared Nearest Neighbor Density Based Clustering Algorithm for Dynamic Datasets
Incremental data mining algorithms process frequent updates to dynamic datasets efficiently by avoiding redundant computation. The existing incremental extension to the shared nearest neighbor density-based clustering (SNND) algorithm cannot handle deletions from the dataset and handles insertions only one point at a time. We present an incremental algorithm to overcome both these bottlenecks by efficiently ...
Incremental parallel and distributed systems
Incremental computation strives for efficient successive runs of applications by reexecuting only those parts of the computation that are affected by a given input change instead of recomputing everything from scratch. To realize the benefits of incremental computation, researchers and practitioners are developing new systems where the application programmer can provide an efficient update mech...